WITP 5(01).book(WITP_A_315534.fm)

Authors

  • Jaime Arguello
  • Jamie Callan
  • Stuart W. Shulman
Abstract

Notice and comment rulemaking is central to how U.S. federal agencies craft new regulation. E-rulemaking, the process of soliciting and considering public comments that are submitted electronically, poses a challenge for agencies. The large volume of comments received makes it difficult to distill and address the most substantive concerns of the public. This work attempts to alleviate this burden by applying existing machine learning techniques to the problem of recognizing citation sentences. A citation in this context is defined as a statement in which the author of the public comment references an external source of factual information that is associated with a specific person or organization. The problem is formulated as a binary classification problem: Is a specific person or organization mentioned in a sentence being referenced as an external source of information? We show that our definition of a citation is reproducible by human judges and that citations can be detected using machine learning techniques with some success. Casting this as a machine learning problem requires selecting an appropriate representation of the sentence. Several feature sets are evaluated individually and in combination. Superior results are obtained by combining feature sets. Syntactic features, which characterize the structure of the sentence rather than its content, significantly improve accuracy when combined with other features, but not when used in isolation. Although prediction error rate is adequate, coverage could be improved. An error analysis enumerates short-term and long-term challenges that must be overcome to improve recall.

Jaime Arguello is a Ph.D. student at the Language Technologies Institute at Carnegie Mellon University. His work focuses on text data mining, information retrieval, and natural language processing.

Jamie Callan is a Professor at the Language Technologies Institute, a graduate department in Carnegie Mellon University’s School of Computer Science. His research and teaching focus on text-based information retrieval. His recent research studies advanced search engine architectures, federated search across groups of search engines, adaptive information filtering, text analysis and organization, text mining, and automatic collection of instructional materials for an intelligent reading tutor. His earlier IR research included first generation Web-search systems, integration of text search with relational database systems, and information literacy in K-12 education.

Dr. Stuart W. Shulman is Director of the Sara Fine Institute in the School of Information Sciences at the University of Pittsburgh. He is also the founder and Director of the Qualitative Data Analysis Program (QDAP) at Pitt’s University Center for Social and Urban Research, which is a fee-for-service coding lab working on projects funded by the National Science Foundation, the National Institutes of Health, DARPA, and other funding agencies. He has been Principal Investigator and Project Director on related National Science Foundation–funded research projects focusing on electronic rulemaking, human language technologies, coding across the disciplines, digital citizenship, and service-learning efforts in the United States. Dr. Shulman is the Editor-in-Chief of the Journal of Information Technology & Politics.

This work was supported in part by the eRulemaking project and NSF grants IIS-0240334 and IIS-0429293. We are grateful to the U.S. EPA and the U.S. Department of the Interior’s FWS for providing the public comment data that made this research possible. We thank the coders at the Qualitative Data Analysis Program (QDAP) for their time and effort in producing our gold standard data and for providing the feedback that guided the evolution of our coding scheme. We also thank the anonymous reviewers for their comments. Any opinions, findings, conclusions, and recommendations expressed in this paper are the authors’ and do not necessarily reflect those of the sponsors.

Address correspondence to: Jaime Arguello, Carnegie Mellon University, Pittsburgh, PA (E-mail: [email protected]).

Journal of Information Technology & Politics
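
The abstract frames citation detection as binary classification over a sentence and a mentioned person or organization, with several feature sets evaluated alone and in combination, and with recall named as the main weakness. The following is only a minimal sketch of those two ideas, not the authors' implementation: the feature extractors, the `REPORTING_CUES` word list, and the example sentence are all hypothetical, and the "cue" features are a crude stand-in for the syntactic features the abstract mentions.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]

# Hypothetical cue words; the abstract does not list any.
REPORTING_CUES = {"according", "reports", "reported", "says", "said", "found", "states"}


def tokenize(sentence: str) -> List[str]:
    """Split on whitespace, strip surrounding punctuation, lowercase."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in sentence.split())
    return [t for t in tokens if t]


def lexical_features(sentence: str, entity: str) -> Features:
    """Content-based feature set: one indicator per token in the sentence."""
    return {tok: 1.0 for tok in tokenize(sentence)}


def cue_features(sentence: str, entity: str) -> Features:
    """Structure-flavored feature set (an assumption, not the paper's features)."""
    has_cue = any(t in REPORTING_CUES for t in tokenize(sentence))
    entity_first = sentence.lower().startswith(entity.lower())
    return {"has_reporting_cue": float(has_cue), "entity_first": float(entity_first)}


def combine(extractors: Dict[str, Callable[[str, str], Features]],
            sentence: str, entity: str) -> Features:
    """Merge feature sets, prefixing each key with its set name to avoid clashes."""
    combined: Features = {}
    for name, extract in extractors.items():
        for key, value in extract(sentence, entity).items():
            combined[f"{name}:{key}"] = value
    return combined


def precision_recall(gold: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Precision and recall for the positive (citation) class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example sentence and labels, for illustration only.
feats = combine({"lex": lexical_features, "cue": cue_features},
                "The EPA reported that mercury levels rose.", "The EPA")
p, r = precision_recall([True, True, False, True], [True, False, False, False])
```

In the made-up labels above, the single positive prediction is correct while two true citations are missed, which mirrors the pattern the abstract describes: few wrong predictions but limited coverage.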

Similar resources

WITP 5(01).book(WITP_A_315126.fm)

In this article, we discuss the design of party classifiers for Congressional speech data. We then examine these party classifiers’ person-dependency and time-dependency. We found that party classifiers trained on 2005 House speeches can be generalized to the Senate speeches of the same year, but not vice versa. The classifiers trained on 2005 House speeches performed better on Senate speeches ...

WITP 5(01).book(WITP_A_315124.fm)

This paper presents the U.S. Election 2004 Web Monitor, a public Web portal that captured trends in political media coverage before and after the 2004 U.S. presidential election. Developed by the authors of this article, the webLyzard suite of Web mining tools provided the required functionality to aggregate and analyze about a half-million documents in weekly intervals. The study paid particul...

WITP 5(01).book(WITP_A_315580.fm)

Many research questions in political communication can be answered by representing text as a network of positive or negative relations between actors and issues such as conducted by semantic network analysis. This article presents a system for automatically determining the polarity (positivity/negativity) of these relations by using techniques from sentiment analysis. We used a machine learning...

Effects of fish meal in beef cattle diets on growth performance, carcass characteristics, and fatty acid composition of longissimus muscle.

We investigated the effects of fish meal (FM) in beef cattle diets on growth performance, carcass characteristics, and fatty acid (FA) composition of longissimus muscle in 63 yearling steers (335 +/- 23 kg). High-moisture corn and alfalfa silage diets were supplemented with either a corn gluten/blood meal mixture or FM at 10% of the diet. Fish meal contained (as-is basis) 5.87 g/kg eicosapentae...

FM 2005: Formal Methods, International Symposium of Formal Methods Europe, Newcastle, UK, July 18-22, 2005, Proceedings

It's coming again, the new collection that this site has. To complete your curiosity, we offer the favorite fm 2005 formal methods international symposium of formal methods europe newcastle uk july 18 22 book as the choice today. This is a book that will show you even new to old thing. Forget it; it will be right for you. Well, when you are really dying of fm 2005 formal methods international s...

Spin state and phase competition in TbBaCo2O5.5 and the lanthanide series

A clear physics picture of TbBaCo2O5.5 is revealed on the basis of density functional theory calculations. An antiferromagnetic (AFM) superexchange coupling between the almost high-spin Co ions competes with a ferromagnetic (FM) interaction mediated by both p-d exchange and double exchange, being responsible for the observed AFM-FM transition. And the metal-insulator transition is accompanied b...

Publication date: 2008